Parallel computing

LLMs

The sheer scale of LLMs makes training on a single device infeasible; they can only be trained by splitting the model, the data, and the computation across many accelerators.

See Picotron (https://github.com/huggingface/picotron) for learning 4D parallelism (data, tensor, pipeline, and context parallelism).
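
To make the first of those four axes concrete, here is a minimal sketch of data-parallel training using PyTorch's DistributedDataParallel. It is not Picotron's implementation; the model, batch size, and hyperparameters are illustrative placeholders.

```python
# Minimal data-parallel training sketch (one axis of 4D parallelism).
# Each process holds a full model replica and trains on its own slice
# of the data; gradients are all-reduced so replicas stay in sync.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Data parallelism replicates parameters on every rank
    # (tensor/pipeline parallelism would shard them instead).
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        # Each rank draws a different shard of the global batch; DDP
        # all-reduces gradients during backward().
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).square().mean()  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=8 train.py`, this runs eight replicas that process eight batch shards per step. The other three axes (tensor, pipeline, context) further split the parameters, layers, and sequence dimension across devices.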